Additivity  and  Auditory  Pattern  Analysis

Robert  A.  Lutfi,  Principal  Investigator 


Project  Summary 

Human  discrimination  of  complex  acoustic  signals  typically  cannot  be  predicted  from  the  simple 
sum  of  the  discriminabilities  associated  with  individual  components  of  the  signal.  Understanding 
such  failures  of  additivity  is  central  to  our  understanding  of  complex  sound  analysis.  The  goal  of 
this  project  is  to  elucidate  the  rules  and  mechanisms  whereby  individual  stimulus  components 
combine  to  influence  the  detection  and  discrimination  of  complex  sounds.  The  project  is  designed  to 
answer  specific  questions  regarding  listeners'  ability  to  integrate  information  within  and  across 
stimulus  dimensions,  to  extract  information  contained  in  the  pattern  of  the  acoustic  signal,  and  to 
perform  under  conditions  of  stimulus  uncertainty.  The  data  are  also  used  to  determine  how  listeners 
weight  the  information  provided  by  different  components  of  the  signal,  and  how  best  to  package  the 
acoustic  information  in  frequency  and/or  time  so  that  it  is  processed  most  effectively  by  the  listener. 
Finally,  work  is  undertaken  to  develop  a  computational  model  to  summarize  and  predict  the  results
of  these  and  future  experiments.

Statement  of  Work/Research  Objectives 

Can  the  perception  of  a  complex  event  be  reduced  to  the  sum  of  its  analyzable  elements?  This 
was  one  of  the  fundamental  questions  that  occupied  the  minds  of  the  earliest  thinkers  interested  in 
understanding  human  perception.  Today,  of  course,  we  are  familiar  with  the  Gestaltist's  favorite 
illusions  demonstrating  that  the  perception  of  the  whole  is  often  greater  than  the  sum  of  its  separate 
parts.  By  demonstrating  the  importance  of  the  relations  among  parts,  the  Gestalt  psychologist
redefined  the  study  of  perception  as  the  study  of  patterns. 

In  contemporary  psychoacoustics,  the  Gestaltist’s  influence  has  been  made  evident  in  pattern 
perception  models  of  pitch  (Goldstein,  1973;  Terhardt,  1974;  Wightman,  1973),  localization 
(Searle,  1982;  Perkins,  Kistler  and  Wightman,  1986),  and  speech  (Stevens  and  Blumstein,  1978). 
Now  there  is  evidence  that  simple  auditory  detection,  as  well,  frequently  involves  an  analysis  of  the 
overall  pattern  of  excitation  produced  by  the  signal  and  masker  (Ahumada  and  Lovell,  1971; 
Ahumada,  Marken,  and  Sandusky,  1975;  Green,  1983;  Green  and  Kidd,  1983;  Green  and  Mason,
1985;  Hall,  Haggard,  and  Fernandes,  1984;  Hanna,  1984;  Leek  and  Watson,  1984;  Lutfi,  1985,
1986;  Spiegel,  Picardi,  and  Green,  1981).  The  basic  result  of  the  detection  studies  is  a  failure  of 
additivity;  components  of  the  acoustic  complex  affect  threshold  in  ways  that  are  not  predicted  by 
summing  their  separate  effects.  Failures  of  additivity  impose  severe  constraints  on  our  ability  to 
predict  the  auditory  system's  response  to  complex  stimuli,  like  speech,  from  the  response  to  much 
simpler  inputs.  Thus,  one  of  the  greatest  challenges  confronting  psychoacoustics  in  the  years  ahead 
is  to  understand  the  mechanisms  and  invariances  that  determine  how  stimulus  components  combine 
to  influence  auditory  perception. 

The  present  project  adopts  an  approach  to  this  problem  which  is  both  simple  and  direct.  In  all 
experiments,  the  unit  of  analysis  is  the  discriminability,  as  measured  by  d',  of  single  tone  bursts  that 
differ  (on  average)  in  level.  The  complex  signals  of  these  experiments  are  composed  of  various
combinations  of  2  to  13  of  these  tone  bursts  distributed  in  frequency  and/or  time.  On  the  basis  of 
simple  additivity,  the  discriminability  of  the  complex  is  given  by  the  vector  summation  rule, 
$d'_{\mathrm{complex}} = \left(\sum_i (d'_i)^2\right)^{1/2}$,  where  $d'_i$  is  the  discriminability  of  the  ith  tone  component  of  the  complex.
The  vector  summation  rule  thus  provides  the  referent  for  evaluating  the  discriminability  actually 
obtained.  This  simple  approach  is  used  to  address  the  following  specific  questions  regarding  the 
processing  of  complex  sounds: 
(1)  How  efficiently  can  human  observers  integrate  information  within  and  across  different
stimulus  dimensions? 

(2)  What  effect  does  uncertainty  along  relevant  and  irrelevant  dimensions  have  on  the  ability 
to  integrate  this  information? 

(3)  How  efficiently  can  observers  extract  information  contained  in  the  pattern  of  level 
variation  across  the  individual  components  of  the  complex? 

(4)  Which  components  of  the  complex  are  weighted  most  heavily  in  the  decision  process? 

(5)  What  is  the  best  way  to  package  the  acoustic  information  in  frequency  and/or  time  so  that 
it  will  be  processed  most  effectively  by  the  observer? 

(6)  What  are  the  mechanisms  underlying  the  discrimination  of  these  complex  sounds?  Can  a 
computational  model  be  developed  to  account  for  the  results? 
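
As a rough illustration of the vector summation rule stated above, the following minimal Python sketch computes the additivity prediction for a complex from the d' values of its components. The function name and the component values are hypothetical and chosen only for illustration; they are not data from the project.

```python
import math


def vector_sum_dprime(component_dprimes):
    """Additivity prediction: root-sum-of-squares of the component d' values."""
    return math.sqrt(sum(d * d for d in component_dprimes))


# Hypothetical d' values for four tone bursts (illustration only).
components = [0.5, 0.5, 0.5, 0.5]
predicted = vector_sum_dprime(components)
print(f"additivity prediction for the complex: d' = {predicted:.2f}")
```

An obtained d' that falls above or below this prediction is what the text refers to as a failure of additivity.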


Summary  of  Research  Progress 

To  summarize  the  research  progress  made  in  answer  to  these  questions  we  list  all  scientific 
publications  to  accrue  from  the  project  along  with  a  brief  description  of  the  research  findings  in  each 
case.  Further  information  pertaining  to  research  progress  can  be  found  in  these  publications. 

Lutfi,  R.  A.  (1988b).  "Complex  interactions  between  pairs  of  forward  maskers,"  Hearing
Research,  35,  71-78.

The  present  study  was  conducted  to  determine  to  what  extent  the  combined  effects  of  two  forward 
maskers  can  be  predicted  from  addition  of  their  individual  effects.  The  maskers  were  50-Hz-wide
noise  bands  with  center  frequencies  ranging  from  1.8  to  2.2  kHz.  The  signal  was  a  brief,  2.0-kHz
tone  burst.  When  the  maskers  were  gated  on  and  off  together,  the  combination  produced  sometimes
more  and  sometimes  less  masking  than  predicted,  depending  on  the  particular  pair  and  the  relative
amounts  of  masking  produced  by  the  individual  maskers  in  the  pair.  The  greatest  discrepancy
occurred,  however,  when  the  masker  pair  was  presented  simultaneously  with  the  signal  or  when  the
forward  maskers  were  presented  in  sequence.  In  the  latter  case,  the  obtained  threshold  exceeded  the 
predicted  threshold  by  as  much  as  34  dB. 

Lutfi,  R.  A.  (1989).  Informational  processing  of  complex  sound:  I.  Intensity  discrimination.
Journal  of  the  Acoustical  Society  of  America,  86,  934-944.

This  paper  reports  on  some  initial  experiments  using  the  sample  discrimination  paradigm  to 
investigate  normal-hearing  listeners'  ability  to  process  information  in  complex,  nonspeech  sounds. 
An  important  feature  of  the  sample  discrimination  experiment  is  that  the  value  of  the  difference  to  be 
discriminated  randomly  varies  from  trial  to  trial.  It  is  this  variation  that  yields  potential  information. 
In  the  present  study,  listeners  heard  a  pair  of  multi-tone  complexes  (or  sequences)  on  each  trial.  The 
individual  levels  of  the  tones  were  drawn  from  two  normal  distributions  differing  only  in  mean.  The 
listener's  task  was  to  identify  the  sound  having  the  higher  mean  tone  level.  For  an  ideal  observer  in 
these  experiments,  performance  in  d'  grows  as  the  square  root  of  n,  where  n  is  the  number  of  tones.
Obtained  d'  grew  more  nearly  as  the  cube  root  of  n  regardless  of  whether  the  tones  were  played
sequentially  or  simultaneously,  or  whether  they  were  increased  in  number  from  high  frequencies  to  low
or  from  low  frequencies  to  high.  A  preliminary  model  is  proposed  in  which  discrimination 
performance  depends  predominantly  on  the  information  content  of  the  sounds  and  is  largely 
independent  of  the  physical  dimensions  along  which  the  sounds  vary.  Information  content  is  defined 
in  terms  of  the  variance  of  the  underlying  stimulus  distributions  and  a  stimulus  equivocation  factor
which  is  derived  from  the  data.  Based  on  this  model,  transmitted  information  is  estimated  to  be 
between  1.0  and  2.6  bits.
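
The square-root growth quoted above for the ideal observer can be checked with a small Monte Carlo sketch of a sample discrimination trial. Everything specific here is an assumption made for illustration: the function name, the mean difference and standard deviation, the trial count, the decision rule (choose the sound with the larger summed tone level), and the two-interval conversion d' = sqrt(2) z(PC). The cube-root column is only a reference curve anchored at the n = 1 value; it is not the reported data.

```python
import math
import random
from statistics import NormalDist


def simulate_ideal_dprime(n_tones, delta=1.0, sigma=2.0, trials=20000, seed=0):
    """Estimate d' for an observer who picks the sound with the larger summed level.

    Tone levels are drawn from two normal distributions that differ only in
    mean (by `delta`), with a common standard deviation `sigma`.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        target = sum(rng.gauss(delta, sigma) for _ in range(n_tones))
        nontarget = sum(rng.gauss(0.0, sigma) for _ in range(n_tones))
        if target > nontarget:
            correct += 1
    pc = correct / trials
    # Two-interval forced choice: d' = sqrt(2) * z(proportion correct).
    return math.sqrt(2.0) * NormalDist().inv_cdf(pc)


single_tone_dprime = 1.0 / 2.0  # delta / sigma for the default parameters
for n in (1, 4, 9):
    simulated = simulate_ideal_dprime(n)
    square_root = math.sqrt(n) * single_tone_dprime
    cube_root = n ** (1.0 / 3.0) * single_tone_dprime
    print(f"n={n}: simulated={simulated:.2f}  sqrt(n) rule={square_root:.2f}  "
          f"cube-root reference={cube_root:.2f}")
```

Under these assumptions the simulated values should track the square-root column to within sampling error, which makes the gap between ideal growth and cube-root growth easy to see as n increases.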


Lutfi,  R.  A.  (1990a).  Informational  processing  of  complex  sound:  II.  Cross-dimensional  analysis.
Journal  of  the  Acoustical  Society  of  America,  87,  2141-2148.

A  series  of  experiments  investigated  listeners'  ability  simultaneously  to  process  information  across 
different  acoustic  dimensions.  On  each  trial  the  listener  heard  a  pair  of  brief  n-tone  sequences  (n  =  1  to 
12).  The  frequency,  intensity,  and  duration  of  each  tone  in  the  sequence  varied  randomly  from  trial  to 
trial.  On  average  the  values  of  these  three  parameters  were  greater  for  one  sequence,  the  target,  than  the 
other,  the  nontarget.  The  listener’s  task  was  to  identify  the  target  on  each  trial.  For  an  ideal  observer  in 
this  task  d'  performance  grows  as  the  square  root  of  n.  Obtained  d'  grew  at  a  rate  slightly  less  than  the
square  root  of  n.  Close  to  cube  root  of  n  growth  was  observed  when  the  average  difference  occurred  in 
only  one  of  the  three  tone  parameter  values  within  a  block  of  trials.  Although  performance  fell  short  of 
ideal,  optimum  weights  were  consistently  given  to  each  tone  and  each  tone  parameter.  The  results  are 
consistent  with  a  model  in  which  performance  depends  predominantly  on  the  information  content  of  the
sounds  regardless  of  how  the  information  is  'packaged'  in  the  stimulus.  Transmitted  information  is 
estimated  to  be  0.9-2.0  bits  within  a  single  acoustic  dimension  and  2.1-3.0  bits  when  distributed  across
dimensions. 

Lutfi,  R.  A.  (1990b).  How  much  masking  is  informational  masking?  Journal  of  the  Acoustical
Society  of  America,  88,  2607-2610.

Tone-in-noise  masking  experiments  have  long  served  as  a  useful  tool  for  measuring  the  limits  of 
auditory  frequency  selectivity  and  temporal  resolution.  The  general  conclusion  to  be  drawn  from  these 
measures  is  that  tone  detectability  is  largely  determined  by  a  small  portion  of  noise  energy  falling  within 
close  spectral  or  temporal  proximity  to  the  tone  (Fletcher,  1940;  Green  and  Swets,  1966;  Penner  et  al., 
1973).  Another  source  of  masking,  not  often  discussed  in  these  studies,  is  that  resulting  from  the 
uncertainty  associated  with  trial-to-trial  variation  in  the  noise  waveform.  Pollack  (1975)  uses  the  term 
informational  masking  to  describe  this  second  type  of  masking.  The  effects  of  informational  masking 
are  well  documented  for  highly  uncertain  maskers,  exceeding  40  dB  in  some  conditions,  even  with
masker  energy  far  removed  from  the  signal.  Given  the  potential  magnitude  of  these  effects,  it  seems 
reasonable  to  ask  how  much  informational  masking  might  exist  in  more  traditional  experiments  using 
noise.  This  paper  provides  an  estimate  based  on  a  theoretical  analysis  of  many  of  the  existing  data. 
The  conclusion  is  that  22%  of  the  masking  observed  in  many  traditional  tone-in-noise  detection 
experiments  is  due  to  uncertainty  associated  with  trial-to-trial  variation  in  the  noise  waveform. 

Lutfi,  R.  A.  (1991a).  Comment  on  "Analysis  of  weights  in  multiple  observation  tasks"  [J.  Acoust.
Soc.  Am.  86,  1743-1746  (1989)].  Journal  of  the  Acoustical  Society  of  America,  91,  507-508.

Generally,  we  can  identify  two  factors  that  limit  an  individual's  ability  to  process  information  from 
multiple  sources:  internal  noise  associated  with  the  observations,  and  undue  reliance  or  weight  given  to
particular  observations.  It  is  shown  that  Berg’s  (1989)  proposed  method  for  assessing  weights  is  not 
independent  of  assumptions  regarding  internal  noise. 

Lutfi,  R.  A.  (1991b).  Informational  processing  of  complex  sound:  III.  Interference.  Journal  of  the
Acoustical  Society  of  America,  (in  press).

In  this  study  theoretical  results  from  information  theory  and  detection  theory  are  applied  to  provide  a 
formal  analysis  of  the  interaction  of  target  and  context  uncertainty  on  the  discrimination  of  brief 
multitone  sequences.  The  experiments  employ  a  sample-discrimination  task  in  which  the  performance
of  an  ideal  observer  is  held  constant  while  the  relative  variability  of  the  target  ($\sigma_T$)  and  context  ($\sigma_C$)  is
varied.  Listener  performance  in  these  experiments  was  less  than  ideal  but  increased  monotonically  with 